Secure multimedia exchange using voice biometric based security system for intellectual protection

Ghada Alhudhud, Duaa Alsaeed, Reem Alsaeed

In light of digital transformation requirements, there is increasing demand for the secure exchange of multimedia content across various digital channels. Among the most commonly exchanged media are video and audio content. Secure multimedia exchange implies content protection and secure retrieval. The proposed method explores providing biometric authentication for multimedia content. Biometric authentication is based on creating a complex watermark from both a voice biometric and a secret image. The voice biometric is analyzed using Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. The MFCC coefficients are processed by a Gaussian mixture model (GMM) to produce the means matrix μs, the covariance arrays of the Gaussians for all clusters in the voice signal, and the negative log energy that represents the entropy. Accordingly, a unique voiceprint template is created. The watermark is then obtained by embedding both the biometric template and the secret image in a given multimedia file. The embedding process is performed using the lifting wavelet transform (LWT). LWT decomposes video frame content into the frequency bands Low/Low (LL), Low/High (LH), High/Low (HL), and High/High (HH). The proposed method requires a) creating the complex watermark composed of the voiceprint template and the secret image, and b) embedding the complex watermark into the user video using singular value decomposition (SVD) of the second level of LWT, using the LL band of the cover video frames and the HH band of the watermark video frames. To test the methodology, five videos with different properties were chosen and the university logo was used as the secret image watermark. In addition, a dataset was created during this research, containing voice recording samples from male and female participants between the ages of 15 and 60. 
The recorded audio clips were of given phrases in Arabic and English, and the average length of each clip is about 2.5 seconds. The total number of play counts in this experiment is 260, as five videos were tested with 13 audio clips for accepted reads (False/True) and another 13 audio clips for rejected reads (False/True). Results showed a low False Acceptance Rate (9.2%), a low False Rejection Rate (12%), a high True Acceptance Rate (95.3%), and a high True Rejection Rate (92%). Based on evaluations with seven metrics, we found points to improve the performance and accuracy of biometric authentication systems for video content protection. The verification process was successful in distinguishing the original video with the original watermark from the tampered video using different voiceprints. Experimentation results exhibit the uniqueness of the complex watermark and verify that the proposed method withstands different media processing attacks based on the various voiceprints.
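
The embedding step described above (second-level LWT, then SVD on the LL band of the cover frame and the HH band of the watermark frame) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, the embedding rule, or the strength factor, so the unnormalized Haar lifting scheme, the additive rule on singular values, the parameter alpha, and all function names below are assumptions made for this example. The extraction shown is non-blind: it needs the cover frame's original singular values.

```python
import numpy as np


def haar_lifting_2d(x):
    """One level of a 2-D Haar lifting transform (assumed wavelet)."""
    def lift(a):
        even, odd = a[:, 0::2], a[:, 1::2]   # split
        detail = odd - even                  # predict
        approx = even + detail / 2.0         # update (pairwise mean)
        return approx, detail

    lo, hi = lift(x)                         # lift along each row
    ll, lh = (m.T for m in lift(lo.T))       # then along each column
    hl, hh = (m.T for m in lift(hi.T))
    return ll, lh, hl, hh


def inverse_haar_lifting_2d(ll, lh, hl, hh):
    """Exact inverse of haar_lifting_2d (undo update, undo predict, merge)."""
    def unlift(approx, detail):
        even = approx - detail / 2.0
        odd = detail + even
        out = np.empty((approx.shape[0], approx.shape[1] * 2))
        out[:, 0::2], out[:, 1::2] = even, odd
        return out

    lo = unlift(ll.T, lh.T).T
    hi = unlift(hl.T, hh.T).T
    return unlift(lo, hi)


def embed(cover, watermark, alpha=0.05):
    """Add the watermark's level-2 HH singular values to the cover's level-2 LL.

    alpha is a hypothetical embedding strength; cover and watermark are
    assumed to be same-sized grayscale frames with sides divisible by 4.
    """
    ll1, lh1, hl1, hh1 = haar_lifting_2d(cover)
    ll2, lh2, hl2, hh2 = haar_lifting_2d(ll1)
    w_hh2 = haar_lifting_2d(haar_lifting_2d(watermark)[0])[3]

    u, s, vt = np.linalg.svd(ll2, full_matrices=False)
    s_w = np.linalg.svd(w_hh2, compute_uv=False)
    ll2_marked = u @ np.diag(s + alpha * s_w) @ vt

    ll1_marked = inverse_haar_lifting_2d(ll2_marked, lh2, hl2, hh2)
    marked = inverse_haar_lifting_2d(ll1_marked, lh1, hl1, hh1)
    return marked, s, s_w


def extract_singular_values(marked, s_cover, alpha=0.05):
    """Recover the embedded singular values from a marked frame (non-blind)."""
    ll2 = haar_lifting_2d(haar_lifting_2d(marked)[0])[0]
    s_marked = np.linalg.svd(ll2, compute_uv=False)
    return (s_marked - s_cover) / alpha


# Synthetic stand-ins for a cover frame and a watermark frame.
rng = np.random.default_rng(0)
cover = rng.random((64, 64)) * 255.0
watermark = rng.random((64, 64))
marked, s_cover, s_w = embed(cover, watermark)
recovered = extract_singular_values(marked, s_cover)
```

Because both sets of singular values are sorted in non-increasing order, their weighted sum is also sorted, so the SVD of the marked LL band returns the modified values in the same order and the subtraction in the extraction step lines up entry by entry. A smaller alpha keeps the marked frame closer to the cover at the cost of a weaker embedded signal.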

Advanced Studies: Euro-Tbilisi Mathematical Journal, Vol. 16, supplement issue 2 (2023), pp. 175-199